Distance-based adaptive k-neighborhood selection

Authors

  • Albrecht Zimmermann
  • KU Leuven
Abstract

The k-nearest neighbor classifier follows a simple yet powerful algorithm: collect the k data points closest to an unlabeled instance, according to a given distance measure, and use them to predict that instance’s label. Two components, the parameter k governing the size of the neighborhood used and the distance measure, essentially determine the success or failure of the classifier.

In this work, we propose to reverse the use of outlier-detection techniques that are based on k-neighborhoods in order to determine the value of k. To achieve this, we invert the workings of these techniques: instead of using a fixed k to decide whether an instance is an outlier, we stop growing the k-neighborhood as soon as the unlabeled instance would be given outlier status. We derive a number of criteria from different neighborhood-based outlier detection techniques. With the exception of one technique, our approaches have low complexity and running times.

In our experiments, we compare against two recently proposed techniques from the field that have more sophisticated theoretical foundations, as well as against two well-established kNN classifiers. We find that our approaches are competitive with existing work and, in particular, that the recent techniques do not constitute an improvement.

CR Subject Classification: I.2, H.2.8

Albrecht Zimmermann, KU Leuven, Celestijnenlaan 200A, Leuven, B-3001 Belgium
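The stopping idea in the abstract (grow the neighborhood until the unlabeled instance would be flagged as an outlier) can be illustrated with a minimal Python sketch. The abstract does not spell out the paper's actual criteria, so the test used here is an assumption made purely for illustration: growth stops when the next candidate neighbor lies more than `threshold` times the mean distance of the neighbors collected so far. The names `adaptive_knn_predict`, `threshold`, and `k_min` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adaptive_knn_predict(train, query, threshold=2.0, k_min=1):
    """Predict a label for `query`, choosing k adaptively.

    train: list of (point, label) pairs.
    Returns (predicted_label, k_used).

    ASSUMPTION: the stopping rule below is a stand-in for a
    neighborhood-based outlier criterion, not the paper's own.
    """
    # Rank all training points by distance to the query.
    ranked = sorted(
        ((euclidean(point, query), label) for point, label in train),
        key=lambda pair: pair[0],
    )
    k = min(k_min, len(ranked))
    while k < len(ranked):
        mean_dist = sum(d for d, _ in ranked[:k]) / k
        next_dist = ranked[k][0]
        # Stop growing once adding the next neighbor would make the
        # query look isolated relative to its current neighborhood.
        if mean_dist > 0 and next_dist > threshold * mean_dist:
            break
        k += 1
    # Majority vote among the k neighbors that were kept.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0], k


train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b"), ((10.5, 10.5), "b"),
]
label, k = adaptive_knn_predict(train, (0.4, 0.4))
print(label, k)
```

In this toy data set the query sits inside the small cluster of three "a" points, so the neighborhood stops growing before it reaches the larger "b" cluster. A fixed k equal to the full data set size would instead be outvoted by the four "b" points.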

Similar resources

Geometry-Aware Neighborhood Search for Learning Local Models for Image Reconstruction

Local learning of sparse image models has proven to be very effective to solve inverse problems in many computer vision applications. To learn such models, the data samples are often clustered using the K-means algorithm with the Euclidean distance as a dissimilarity metric. However, the Euclidean distance may not always be a good dissimilarity measure for comparing data samples lying on a mani...

The Time Adaptive Self Organizing Map for Distribution Estimation

The feature map represented by the set of weight vectors of the basic SOM (Self-Organizing Map) provides a good approximation to the input space from which the sample vectors come. But the time-decreasing learning rate and neighborhood function of the basic SOM algorithm reduce its capability to adapt weights for a varied environment. In dealing with non-stationary input distributions and changi...

Parameterless Isomap with Adaptive Neighborhood Selection

Isomap is a highly popular manifold learning and dimensionality reduction technique that effectively performs multidimensional scaling on estimates of geodesic distances. However, the resulting output is extremely sensitive to parameters that control the selection of neighbors at each point. To date, no principled way of setting these parameters has been proposed, and in practice they are often...

An Adaptive LEACH-based Clustering Algorithm for Wireless Sensor Networks

LEACH is the most popular clustering algorithm in Wireless Sensor Networks (WSNs). However, it has two main drawbacks, including random selection of cluster heads, and direct communication of cluster heads with the sink. This paper aims to introduce a new centralized cluster-based routing protocol named LEACH-AEC (LEACH with Adaptive Energy Consumption), which guarantees to generate balanced cl...

A Robust Competitive Global Supply Chain Network Design under Disruption: The Case of Medical Device Industry

In this study, an optimization model is proposed to design a Global Supply Chain (GSC) for a medical device manufacturer under disruption in the presence of pre-existing competitors and price inelasticity of demand. Therefore, static competition between the distributors’ facilities to more efficiently gain a further share in market of Economic Cooperation Organization trade agreement (ECOTA) is...

Publication date: 2013